Abstract

World-wide collaboration in high-energy physics (HEP) is a tradition which dates back several decades, 
with scientific publications mostly coauthored by scientists from different countries. This coauthorship
phenomenon makes it difficult to identify precisely the "share" of each country in HEP scientific 
production. One year's worth of HEP scientific articles published in peer-reviewed journals is analysed 
and their authors are uniquely assigned to countries. This method allows the first correct estimation 
on a pro rata basis of the share of HEP scientific publishing among several countries and institutions. 
The results provide an interesting insight into the geographical collaborative patterns of the HEP 
community. The HEP publishing landscape is further analysed to provide information on the journals 
favoured by the HEP community and on the geographical variation of their author bases. These results 
provide quantitative input to the ongoing debate on the possible transition of HEP publishing to an 
Open Access model.

1 Introduction



High-energy physics (HEP) is commonly regarded as one of the most international and collaborative 
scientific disciplines. Over the last six decades, large experiments at accelerators of ever-increasing
energy brought together first dozens, then hundreds and now thousands of scientists from an
increasingly wide spectrum of countries. Furthermore, theoretical HEP was a truly global enterprise
long before present-day cross-border communication. This endeavour was fostered by a long-standing
tradition of scientific exchange, regular gatherings and long-term visits by scientists to several major
centres of attraction.

As a consequence of this well-established and thriving cross-border tradition, coauthorship of HEP 
articles by scientists affiliated to institutes in different countries is the norm rather than the exception. 
At the same time this coauthorship phenomenon complicates bibliometric studies aimed at evaluating 
the relative contributions of different countries to the production of HEP articles. 

This article presents an analysis of the distribution of HEP authorship over several countries and 
institutes, taking into account the coauthorship phenomenon on a pro rata basis. This analysis is 
based on one year's worth of HEP articles, selected as presented in Section 2. Section 3 explains
the data-analysis procedure and discusses some bibliometric results. Results on the geographical
distribution of HEP authorship are presented in Section 4 and then interpreted in Section 5 in terms
of global collaborative patterns. The publishing landscape is investigated in Section 6, which identifies
the journals most used by HEP authors. Section 7 presents additional results on the breakdown of
the author base of the leading HEP journals among different countries; the distribution over different 
journals of the HEP scientific production of several countries and institutes is also discussed. 

These results are particularly relevant as they constitute a quantitative basis for the ongoing 
debate on the possible transition of HEP publishing to an Open Access model [1]. No assessment of 
the economic implications of such a transition is possible without clear and uncontroversial data on
the contributions of different countries to HEP scientific publishing, which are presented here for the
first time.

2 Data Sample 

The preprint culture in HEP pioneered the free distribution of scientific results. For decades, theoretical 
physicists and scientific collaborations, eager to disseminate their findings faster than the
distribution of scholarly publications, printed and mailed hundreds, even thousands, of copies of their 
manuscripts before submitting them to peer-reviewed journals. This preprint culture tended, however, 
to favour the large laboratories and universities that could afford mailing large numbers of preprints 
while receiving comprehensive regular mailings [2]. The spread of the Internet and the inception of the
arXiv repository [3] ushered in a new era for the preprint culture, offering all scientists a level playing
field. In its current implementation, arXiv allows researchers to submit their preprints and browse or 
receive regular feeds on recent submissions in their area of interest [4]. The arXiv repository and its 
mirrors collect the corpus of HEP articles, classified into four categories: 

• hep-ex for high-energy experimental physics;

• hep-lat for studies of lattice field theory;

• hep-ph for particle phenomenology;

• hep-th for string, conformal and field theory.

The attribution of articles to a particular category is performed by the authors themselves at
submission time. The system supports cross-referencing, while multiple submission is discouraged, so
that no double counting of the same article from two categories is expected in the following analysis.

This analysis is based on all preprints submitted to arXiv in the year 2005 and classified in one of 
the four HEP categories. Owing to the widespread preprint culture in HEP, this sample represents a
faithful snapshot of HEP peer-reviewed scientific literature.

As in many other disciplines, HEP results are often presented in preliminary form at international 
conferences or workshops before being officially released in the form of a publication in a peer-reviewed 
journal. Results are then often summarised at other conferences in the following years. Preprints 
usually appear describing these conference contributions and therefore arXiv stores multiple, albeit 
different, entries corresponding to different phases of the life-cycle of a scientific result. To avoid this 
form of multiple counting of the same piece of work, the following analysis is restricted to preprints 
subsequently published in peer-reviewed journals. This requirement also removes lecture notes, theses 
and other unpublished material submitted to arXiv but not relevant for this analysis. 

The data on which this analysis is based are extracted from the SPIRES database [5] hosted
at SLAC, the Stanford Linear Accelerator Center in California, and jointly compiled with
DESY, the Deutsches Elektronen-Synchrotron in Hamburg, and FNAL, the Fermi National Accelerator
Laboratory in Illinois. This database is chosen as it has complete coverage of the HEP articles in
arXiv and in addition includes publication information. As an example, the sample of preprints
submitted to the hep-ex category in arXiv during 2005, and subsequently published, is obtained with
the following query:

FIND EPRINT HEP-EX/05# AND PS P AND NOT TYPE C 
AND NOT TYPE L AND NOT TYPE B AND NOT TYPE T 

Conference articles, lecture notes, theses and books are explicitly removed from the search. The 
samples for the other three arXiv categories are obtained mutatis mutandis. 
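The four category queries differ only in the arXiv category name, so they can be generated from a single template. The following is a minimal Python sketch that only builds the query strings: the helper name `build_spires_query` is hypothetical, no SPIRES client is assumed, and the two-digit year prefix simply follows the `HEP-EX/05#` pattern of the query shown above.

```python
CATEGORIES = ("hep-ex", "hep-lat", "hep-ph", "hep-th")

# Record types excluded by the query in the text (TYPE C, L, B and T), which
# removes conference articles, lecture notes, theses and books.
EXCLUDED_TYPES = ("C", "L", "B", "T")


def build_spires_query(category: str, year: int) -> str:
    """Return the search string for published preprints of one category."""
    prefix = f"{category.upper()}/{year % 100:02d}#"
    exclusions = " ".join(f"AND NOT TYPE {t}" for t in EXCLUDED_TYPES)
    return f"FIND EPRINT {prefix} AND PS P {exclusions}"


for category in CATEGORIES:
    print(build_spires_query(category, 2005))
```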

3 Data Analysis 

Table 1 presents the numbers of hits obtained by the SPIRES query in the four categories and their 
sum for the year 2005 as well as the entire historical record. A total of 5016 articles are selected for 
the year 2005. The total numbers of submissions for each arXiv category obtained with queries such 
as: 

FIND EPRINT HEP-EX/05# 

are also presented in Table 1 together with their sum. The difference with respect to the sample
considered in this article is composed of conference articles and unpublished material. The ratios of
the numbers of published articles to the numbers of arXiv submissions are also presented in Table 1.

The historical evolution of the numbers in Table 1 is interesting: early years show a gradual increase
in the number of submissions, consistent with the gradual adoption of the system, while numbers for
later years are consistent with a plateau structure with year-to-year variations of a few percentage
points.

Table 1: Numbers of preprints submitted to the different arXiv HEP categories (Ns) and subsequently published in
peer-reviewed journals (Np) together with their total. The ratio e = Ns/Np is also listed. Figures are given for the entire arXiv
historical sample. Data corresponding to the year 2005 is used in this analysis.

The queries on which this article is based were performed in the second half of October 2006 
and one could argue that some preprints submitted in late 2005 could have still been in the editorial 
process and would not therefore have yet appeared in peer-reviewed journals. If the five-year period
2000–2004 is used to predict the number of articles extracted by the query for the year 2005, the
prediction is just 6% above the number actually observed, leading to the conclusion that no large
systematic bias affects the size of the sample under consideration. There are no reasons to believe that any sizable
systematic effect from a small fraction of "undiscovered" articles would affect the relative contributions 
of different countries presented in the following. 

Figure 1 presents the distribution among the four different arXiv categories of the 5016 articles 
on which this analysis is based. Experimental results account for just 6.7% of the total. 

Distribution of HEP articles by discipline

Figure 1: Distribution by arXiv category of the sample used in this analysis, corresponding to 5016
preprints submitted in the year 2005 and subsequently published in peer-reviewed journals.

Figure 2: Distributions of the number of authors of hep-lat, hep-ph and hep-th articles used in this
analysis. The distributions are normalised to unit area and their mean is indicated.

A first bibliometric result extracted from this study is the distribution of the number of authors
per article. Figure 2 presents the distribution of the number of authors of each article in the three
non-experimental classes hep-lat, hep-ph and hep-th. The average numbers of authors for the three
classes are 3.6, 2.9 and 2.3, respectively. The average number of authors for the sum of the three
classes is 2.6. The average number of authors for the hep-ex class is about 290. The hep-ex distribution
of the number of authors is biased by the fact that a dozen large experimental collaborations appear
several times in the data sample. The breakdown of the considered arXiv:hep-ex sample into different
experiments is shown in Figure 3. Implications of the large number of authors in experimental
collaborations are discussed in Reference [6].

Distribution of experimental articles by collaborations 




Figure 3: Number of articles from the large experimental collaborations submitted to arXiv:hep-ex
in 2005 and subsequently published in peer-reviewed journals. The "Other" category comprises
collaborations which published fewer than 4 articles as well as articles with fewer than 40 authors. The
total number of articles is 338.

Unfortunately, as of today, no database allows an automatic extraction of bibliographic information
concerning author affiliations for HEP articles at the level needed for this analysis. Therefore each 
article satisfying the query had to be inspected to perform a manual classification of the authors 
according to their affiliation. The output format of SPIRES partly alleviates this problem as author 
affiliations are often readable off the standard web-based output of the queries without having to access 
the article metadata on a publisher's web site or the full-text version in arXiv. Author affiliations 
were classified into 22 classes, listed in the first column of Table 2. European, American and Asian 
countries are singled out according to their contribution to the global HEP scientific production, 
down to a lower limit of about 1%. The contribution from CERN, the world's largest HEP laboratory, 
is shown separately. The remaining countries are divided into two classes: CERN Member States¹
and the remaining countries. As the vast majority of HEP in Italy is funded by INFN, the Istituto 
Nazionale di Fisica Nucleare, its contribution has been considered in lieu of the Italian one. Italian 
authors without an INFN affiliation are counted in the "Other Member States" category. 

As mentioned above, medium- and long-term visits of authors to different institutes and major
laboratories are the staple diet of the HEP collaborative soul. As a consequence, authors of HEP articles
often have multiple affiliations. Three principles to assign authors with multiple affiliations to a single 
class are followed in the order they are presented below. 

1. If one of the multiple affiliations of an author is a HEP laboratory, the author is assigned to that 
laboratory in the case of CERN, or to the host nation of the laboratory in the other cases. 

2. If only one of the multiple affiliations of an author corresponds to one of the countries explicitly 
singled out for the analysis, the author is assigned to that country. 

3. If more than one of the multiple affiliations of an author corresponds to one of the countries
explicitly singled out for the analysis, the author is assigned to a country or institution according
to an indicator which takes into account their per-capita Gross Domestic Product and their
expected share of the HEP scientific production.
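The three principles can be read as an ordered decision procedure. The Python sketch below is one possible reading, not the procedure actually used. The table `LAB_CLASS` contains only the laboratories named in this article, and the names `assign_class` and `tie_break` are hypothetical. The indicator of the third principle is left to a caller-supplied function, because its definition is not given here. Authors with no affiliation in a singled-out country are not covered by the three principles, so the sketch returns `None` for them.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Affiliation = Tuple[str, str]  # (institute, country)

# Illustrative table only: laboratories named in this article and the class
# they lead to under principle 1 (CERN itself, or the host nation otherwise).
LAB_CLASS = {
    "CERN": "CERN",
    "SLAC": "United States",
    "FNAL": "United States",
    "DESY": "Germany",
}


def assign_class(
    affiliations: Sequence[Affiliation],
    singled_out: Set[str],
    tie_break: Callable[[List[str]], str],
) -> Optional[str]:
    """Assign an author with several affiliations to a single class."""
    # Principle 1: a HEP laboratory takes precedence (first one listed wins).
    for institute, _country in affiliations:
        if institute in LAB_CLASS:
            return LAB_CLASS[institute]

    # Collect the singled-out countries among the affiliations, without repeats.
    candidates: List[str] = []
    for _institute, country in affiliations:
        if country in singled_out and country not in candidates:
            candidates.append(country)

    # Principle 2: exactly one singled-out country.
    if len(candidates) == 1:
        return candidates[0]
    # Principle 3: several singled-out countries; defer to the indicator.
    if candidates:
        return tie_break(candidates)
    return None  # case not covered by the three principles
```

Passing the indicator as a function keeps the ordering of the principles explicit while leaving the GDP-based choice outside the sketch.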

4 Distribution of the HEP Production by Country 

The first result of this analysis is the calculation of the share of HEP publications authored by each 
of the 22 countries and institutions into which the authors are classified. For each article in one of the 
four arXiv categories, each of the 22 countries and institutions is attributed a fraction of the article 
corresponding to the number of authors associated to that country, divided by the total number of 
authors. The sum of these fractions over all the articles of an arXiv category, divided by the total 
number of articles in that category, defines the share of a particular country or institution. The results 
are listed in Table 2 for the four arXiv categories as well as for their average. Figure 4 presents the 
distribution of the HEP scientific production over different countries. To our knowledge, this is the 
first result on the distribution of the HEP scientific literature by country where the phenomenon of 
coauthorship is taken into account. 
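The pro rata attribution described above amounts to fractional counting. The following Python sketch illustrates it on made-up input. The function name `category_shares` and the class labels are hypothetical, and each article is represented simply as the list of classes of its authors.

```python
from collections import Counter
from typing import Dict, List


def category_shares(articles: List[List[str]]) -> Dict[str, float]:
    """Share of each class for one arXiv category, with fractional counting.

    Each article is given as the list of classes of its authors.
    """
    if not articles or any(not authors for authors in articles):
        raise ValueError("need at least one article, each with at least one author")

    totals: Counter = Counter()
    for authors in articles:
        # Each class receives (its authors) / (all authors) of this article.
        for cls, n_authors in Counter(authors).items():
            totals[cls] += n_authors / len(authors)

    # Dividing by the number of articles makes the shares sum to one.
    return {cls: total / len(articles) for cls, total in totals.items()}


# Toy input: two articles, the first with three authors, the second with one.
shares = category_shares([["United States", "United States", "CERN"], ["CERN"]])
print(shares)
```

In this toy input the first article contributes 2/3 to the United States and 1/3 to CERN, and the second contributes 1 to CERN, so the shares are 1/3 and 2/3 respectively.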

It is interesting to combine the results presented in Table 2 into the three largest sections of HEP
authorship: CERN and its Member States, the United States, and the remaining countries. These
results are presented in Table 3 for the four arXiv classes and their average. Figures 5 and 6 show a
summary of the distributions of HEP authorship for the arXiv classes and their average, respectively.

¹ CERN Member States not already listed in the first column of Table 2 are: Austria, Belgium, Bulgaria, the Czech
Republic, Denmark, Finland, Greece, Hungary, Norway, Poland and the Slovak Republic.

Table 2: Distribution of HEP scientific literature over different countries and institutions for the four
HEP arXiv classes and their average.

Table 3: Distribution of HEP scientific production over three geographical groups for the four HEP
arXiv classes and their average.

Distribution of HEP articles by country




Figure 4: Distribution of the HEP scientific literature over different countries and institutions. A sample of 5016 articles
submitted to arXiv in 2005 and subsequently published in peer-reviewed journals is considered. Coauthorship is taken into
account by assigning fractions of articles to different countries on a pro rata basis.

Distribution of HEP articles by country: hep-ph

Distribution of HEP articles by country: hep-th

Figure 5: Distribution of HEP scientific production over three geographical groups for the four arXiv
classes.

Distribution of HEP articles by country: hep-ex, hep-lat, hep-ph, hep-th

Figure 6: Distribution of HEP scientific production over three geographical groups.

5 Collaborative Patterns in HEP



The data sample under investigation allows a study of the collaborative patterns in HEP in order 
to answer a natural question: which groups of countries and institutions collaborate? A simplified 
approach to address this question is chosen, in which only three large groups of authors are considered, 
according to their affiliation to one of three sections of HEP authorship: CERN and its Member States, 
the United States, and the remaining countries. Results from more complex analyses of other data 
samples focusing on author-to-author collaborative networks are presented in Reference [7]. Each 
article is assigned to one of seven mutually-exclusive classes: 

1. all the authors are associated to CERN or any of its Member States; 

2. all the authors are associated to the United States; 

3. no authors are associated to CERN, its Member States or the United States; 

4. some authors are associated to CERN or one of its Member States and some to the United 
States, but none to any other country; 

5. some authors are associated to CERN or one of its Member States and some to other countries, 
but none to the United States; 

6. some authors are associated to the United States and some to other countries but none to CERN 
or any of its Member States; 

7. at least one author is associated to CERN or one of its Member States, one to the United States 
and one to some other country. 

Figure 7 presents the fraction of HEP articles in each of these seven classes while Figure 8 shows the 
results for the four separate arXiv disciplines. 
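The seven classes correspond to the seven non-empty combinations of the three groups. A minimal Python sketch of the classification follows. The group labels and the function name `collaboration_class` are hypothetical, and each author is assumed to have already been mapped to one of the three groups.

```python
from typing import Iterable

CERN_MS = "CERN and Member States"
US = "United States"
OTHER = "Other countries"

# (CERN/Member States present, United States present, other countries present)
# -> class number as listed above.
CLASS_BY_PRESENCE = {
    (True, False, False): 1,
    (False, True, False): 2,
    (False, False, True): 3,
    (True, True, False): 4,
    (True, False, True): 5,
    (False, True, True): 6,
    (True, True, True): 7,
}


def collaboration_class(author_groups: Iterable[str]) -> int:
    """Return the class (1 to 7) of an article from the groups of its authors."""
    present = set(author_groups)
    unknown = present - {CERN_MS, US, OTHER}
    if unknown or not present:
        raise ValueError(f"unexpected or missing groups: {sorted(unknown)}")
    return CLASS_BY_PRESENCE[(CERN_MS in present, US in present, OTHER in present)]
```

Because the key records only which groups are present, the classes are mutually exclusive by construction and every article with at least one author falls into exactly one of them.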

6 Distribution of HEP Publications among Journals 

The 5016 articles considered in this study appeared in 89 different peer-reviewed journals. The
distribution of articles over the different journals is presented in Table 4 for the four different HEP
disciplines and their global average, which is also shown in Figure 9. Only the 11 journals with a share
above 1% are considered in Table 4 and Figure 9. However, the share of Nuclear Instruments and
Methods in Physics Research (NIM) is also singled out. The contribution to this journal is interesting
as this title is the reference journal for instrumentation in HEP. The low share of this journal in the
total is due to the reduced contribution of experimental HEP to the total production compared to
the theoretical and phenomenological studies, as presented in Figure 1. However, the low percentage
of instrumentation articles among the total number of experimental articles, 2.7%, is also due to the
far less widespread culture of self-archiving results in arXiv in the HEP instrumentation community.
A direct inspection of articles published in NIM in 2005 revealed about 30% of articles of potential
interest for HEP instrumentation which had not been submitted to arXiv, either before or after
publication.

CERN & Member States 53% United States

Other Countries



Figure 7: Collaborative patterns in HEP. Numbers in the circles at the vertices of the triangle represent 
the percentages of articles produced by individual authors or authors collaborating with others within 
the same group of countries and institutions. Numbers in the dashed circles along the sides of the 
triangle represent the percentages of articles produced by collaborations of authors from countries and 
institutions in the two groups indicated by the neighbouring vertices. The number in the dashed circle 
in the centre of the triangle represents the articles produced by collaborations spanning the three 
groups of countries. The plot presents results for the entire HEP production submitted to arXiv in 
2005 and subsequently published.

Figure 8: Collaborative patterns in HEP. Numbers in the circles at the
vertices of the triangle represent the percentages of articles produced by individual authors or authors
collaborating with others within the same group of countries and institutions. Numbers in the dashed
circles along the sides of the triangle represent the percentages of articles produced by collaborations
of authors from countries and institutions in the two groups indicated by the neighbouring vertices.
The numbers in the dashed circle in the centre of the triangle represent the articles produced by
collaborations spanning the three groups of countries. The plot presents the results for each of the
four disciplines in which arXiv preprints are classified by the authors.

Distribution of HEP articles by journal




CERN Scientific Information Service 



Figure 9: Distribution of the HEP articles in several journals. Only journals with a total share above 1% are considered, with
the exception of Nuclear Instruments and Methods in Physics Research (NIM). The remaining 77 journals are grouped under
"Others". Journals are ordered clockwise according to decreasing shares. A total of 83% of HEP articles are published in just
six journals.



An analysis of the results in Table 4 shows that 83% of HEP articles are published in just six 
journals: Physical Review (A through E); Journal of High Energy Physics (JHEP); Physics Letters 
(A and B); Nuclear Physics (A and B); Physical Review Letters and the European Physical Journal 
(A and C).

Table 4: Distribution of HEP articles over different journals for the four HEP arXiv classes and their
average. Only journals with a total share above 1% are considered, with the exception of Nuclear
Instruments and Methods in Physics Research (NIM). The remaining 77 journals are grouped under
"Others". The publishers of the different journals are also indicated.

These six journals are published by just four publishers: the American Physical Society, Elsevier, 
SISSA and Springer, as detailed in Table 4. It is interesting to split the corpus of HEP scientific
literature discussed in this article according to the publisher of the journal in which the article appeared.
The results are presented in Figure 10. A total of 87% of HEP articles are published by the same four 
publishers listed above. 

7 Geographical Analysis of HEP Journals 

The quantitative information on the different countries and institutions contributing to each of the 
HEP articles considered in this analysis allows the estimation of the geographical distribution of the 
authors for each of the 12 journals listed in Table 4. The analysis of Section 4 is repeated for each
journal and the results are presented in Table 5 for all 22 countries and institutions considered in 
this article, as well as their grouping into three sections: CERN and its Member States, the United 
States, and the remaining countries. Figures 11 and 12 present these results in graphical form, with 
the contributions from CERN and its Member States grouped. 

Distribution of HEP articles by publisher

Figure 10: Distribution of HEP articles over different publishers. A total of 87% of HEP articles are
published by four publishers: APS, Elsevier, SISSA and Springer.

In addition to the geographical distribution of the authors for the major HEP journals, it is
interesting to identify the most popular journals of the individual countries and institutions considered
in this analysis. To extract this information, all articles with at least one author from a given country
or institution are first selected. Then, the fraction of authorship of this country or institution is
calculated for each article. This fraction is assigned to the journal where the article appeared. The
sum of all these fractions for each journal provides a score of the popularity of the journal. If the sum
of these scores is used to measure the total HEP scientific production of the country, it can be used to
normalise each score and obtain the fractions of the HEP production of the country in the different
journals. The results of this study are presented in Table 6 for each of the 22 countries and institutions
discussed in this article. The last three lines of Table 6 present the results summed over three groups:
CERN and its Member States, the United States and the remaining countries. The results for these
groups are presented in Figure 13. Figures 14 and 15 present results for some European countries and
institutions and Figure 16 presents results for some of the remaining countries.
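The journal scores and their normalisation can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: each article is a hypothetical `(journal, author_classes)` pair, and the function name `journal_fractions` is not taken from the original analysis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Article = Tuple[str, List[str]]  # (journal, class of each author)


def journal_fractions(articles: List[Article], country: str) -> Dict[str, float]:
    """Fractions of a country's HEP production appearing in each journal."""
    scores: Dict[str, float] = defaultdict(float)
    for journal, authors in articles:
        if country not in authors:
            continue  # keep only articles with at least one author from the country
        # Fraction of authorship of this country, credited to the journal.
        scores[journal] += authors.count(country) / len(authors)

    total = sum(scores.values())  # total production of the country
    # With no selected article, scores is empty and no division takes place.
    return {journal: score / total for journal, score in scores.items()}
```

For a country with half the authorship of one article in a journal and the full authorship of one article in another, the scores are 0.5 and 1, which normalise to fractions of 1/3 and 2/3.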



8 Conclusions, with a Note on Open Access 

This article presents the results of the first bibliometric study of HEP publishing which accounts for 
the widespread phenomenon of coauthorship. The share of HEP scientific results published by several 
countries and institutions is correctly calculated and provides interesting insight into the collaborative 
patterns within the HEP community. The publishing landscape of HEP is further analysed to provide
information on the journals most used by the HEP community and on the geographical distribution
of their authors.

It is interesting to put these results into the wider context of a possible transition of HEP publishing 
to an Open Access model [1]. The finding that 83% of HEP articles are published in just six journals 
and that 87% of the articles appear in journals published by just four publishers is particularly 
interesting. It demonstrates that the number of partners to be engaged with in a debate on a change 
of the HEP publishing model is relatively small. The worldwide collaborative patterns in HEP, which
are quantified in this article, suggest that once a limited number of countries embrace an Open
Access publishing model, a "domino effect" is likely to spread this policy to other countries through
coauthorship links. Last, but not least, the assessment of the relative contribution to the worldwide
production of HEP scientific results which takes into account the coauthorship phenomenon, presented 
in Table 2 and Figure 4, might constitute the basis for a model where each country or institution would
contribute its "fair share" towards the financial cost of Open Access publishing.

Table 5: Geographical distribution of the authors of HEP journals. The lower part of the table summarises the results for
three sections of the HEP community: CERN and its Member States, the United States, and the remaining countries.

Table 6: Distribution of each country's HEP articles in several journals. The lower part of the table summarises the results
for CERN and its Member States, the United States, and all remaining countries.






Figure 11: Geographical distribution of HEP authors by journals.

Figure 12: Geographical distribution of HEP authors by journals.

Figure 13: Distribution of HEP articles in different journals for three country groups.

Figure 14: Distribution of HEP articles in different journals for several European countries.

Figure 15: Distribution of HEP articles in different journals for several European countries.

Figure 16: Distribution of HEP articles in different journals for several countries.